ABSTRACT

Cyclic GMP-AMP (cGAMP) synthase (cGAS) was recently
identified as a cytosolic DNA sensor and
generates a non-canonical cGAMP that contains
G(2',5')pA and A(3',5')pG phosphodiester linkages.
cGAMP activates STING which triggers innate im- 
mune responses in mammals. However, the evolu- 
tionary functions and origins of cGAS and STING 
remain largely elusive. Here, we carried out compre- 
hensive evolutionary analyses of the cGAS-STING 
pathway. Phylogenetic analysis of cGAS and STING 
families showed that their origins could be traced 
back to a choanoflagellate Monosiga brevicollis. 
Modern cGAS and STING may have acquired struc- 
tural features, including zinc-ribbon domain and crit- 
ical amino acid residues for DNA binding in cGAS 
as well as carboxy terminal tail domain for transduc- 
ing signals in STING, only recently in vertebrates. 
In invertebrates, cGAS homologs may not act as 
DNA sensors. Both proteins cooperate extensively, 
have similar evolutionary characteristics, and thus 
may have co-evolved during metazoan evolution. 
cGAS homologs and a prokaryotic dinucleotide cy- 
clase for canonical cGAMP share conserved sec- 
ondary structures and catalytic residues. Therefore, 
non-mammalian cGAS may function as a nucleotidyl- 
transferase and could produce cGAMP and other 
cyclic dinucleotides. Taken together, assembling sig- 
naling components of the cGAS-STING pathway onto 
the eukaryotic evolutionary map illuminates the func- 
tions and origins of this innate immune pathway. 



INTRODUCTION 

Innate immune sensing of microbial infections represents 
a crucial element for host defense. Utilizing germ-line en- 
coded pattern recognition receptors (PRRs), innate immu- 
nity examines extracellular, endosomal and cytosolic com- 
partments for signs of infection and triggers type I in- 
terferon (IFN) induction and other proinflammatory cy- 
tokines when the pathway is activated. DNA has been 
known to stimulate immune responses for more than a cen- 
tury, long before it was shown to be the genetic material. 
Cytosolic DNA of pathogenic bacterial or viral origin, or 
leaking from the nucleus or mitochondria following cell 
stress can be sensed by eukaryotic cells as a danger signal 
or a sign of foreign invasion (1). The accumulation of self- 
DNA can also produce severe autoimmune diseases, such as 
systemic lupus erythematosus. Over the past several years, 
many PRRs for cytosolic DNA have been identified, including
DNA-dependent activator of IFN-regulatory factors
(DAI) (2), RNA polymerase III (3,4), DEAD box polypep- 
tide 41 (DDX41) (5), absent in melanoma 2 (AIM2) (6-8) 
and IFN-inducible protein 16 (IFI16) (9). The presence of 
these multiple PRRs may reflect their functioning in a dis- 
tinct cell-type- or DNA-sequence-specific manner (10), and 
no consensus has emerged until recently. 

Detection of cytosolic DNA activates a stimulator of 
IFN genes (STING, also known as MITA, MPYS, ERIS 
or TMEM173), an endoplasmic reticulum (ER) translocon- 
associated transmembrane protein (11-14). STING in turn 
initiates a cascade of known events by first recruiting and 
activating the cytosolic kinases, IκB kinase (IKK) and
TANK-binding kinase 1 (TBK1), which phosphorylate and
activate the transcription factors nuclear factor κB (NF-κB)
and IFN regulatory factor 3 (IRF3), respectively.
NF-κB and IRF3 then enter the nucleus and function together
to induce IFNs and other cytokines, and thereby trigger 
the host immune response (1). STING is a central player 
in the innate immune response to cytosolic nucleic acids
(15). STING could also act as a direct PRR for cyclic
dinucleotides, such as cyclic diguanylate monophosphate
(c-di-GMP) and cyclic diadenylate monophosphate (c-di-AMP),
which are conserved signaling molecules produced by bacteria
that regulate bacterial motility and biofilm formation (16).

Recently, cyclic GMP-AMP (cGAMP) synthase (cGAS, 
also known as C6orf150 and MAB21D1) was reported in
Homo sapiens (human) and Mus musculus (mouse) as a gen- 
eral (with broad specificity) cytosolic DNA sensor for acti- 
vating type I IFN signaling pathway (17,18). cGAS binds 
DNA and catalyzes the synthesis of cGAMP from adeno- 
sine triphosphate (ATP) and guanosine triphosphate (GTP) 
in the presence of DNA. cGAMP, an endogenous second 
messenger structurally similar to c-di-GMP and c-di-AMP, 
binds and activates STING in the cytoplasm, illuminating 
how STING can stimulate type I IFN pathway in response 
to both cytosolic DNA and cyclic dinucleotides (18).
Furthermore, cGAMP binds STING with an affinity of ~10
nM, which is significantly stronger than that of c-di-GMP,
and induces a 'closed' conformation of STING that is
important for activating downstream type I IFN signaling
(19-21).

The details of this cGAS-STING pathway were un- 
covered by a series of structural, biophysical and bio- 
chemical studies (19,21-25). Crystal structures of the nu- 
cleotidyltransferase domain of cGAS (23,26-27) estab- 
lished how cGAS functions as a DNA-sensing enzyme in 
a sequence-independent manner. cGAS interacts with the 
sugar-phosphate backbone along the minor groove of DNA 
by employing a positively charged surface as well as a
zinc-ribbon domain insertion. These studies focused on a
cGAS-DNA complex harboring one cGAS molecule and one DNA
molecule. In contrast, more recent work identified a 2:2
complex in which dimeric cGAS is bound to two DNA molecules
(24,25). Both DNA binding surfaces and the dimer interface
are critical to DNA binding. More interestingly, the
endogenous cGAMP generated
by cGAS contains a phosphodiester linkage between the 2'- 
OH of GMP and the 5'-phosphate of AMP and another 
between the 3'-OH of AMP and the 5'-phosphate of GMP. 
This specific isomer of cGAMP with 2'-5', 3'-5' linkages 
is termed 2'3'-cGAMP, and is distinguished from conven- 
tional cGAMP (with 3'-5', 3'-5' linkages and termed 3'3'- 
cGAMP) and other cyclic dinucleotides (such as c-di-AMP 
and c-di-GMP) of microbial origin. Both mouse (R231, 
with an Arg residue at site 231) and human (R232) STING 
proteins can be stimulated by 2'3'-cGAMP, 3'3'-cGAMP 
and bacterial 3'3'-c-di-GMP (20,21), but human STING 
with the H232 allele is only responsive to 2'3'-cGAMP (20- 
22). 

Moreover, the Chen lab has provided genetic evidence 
that cGAS as well as STING are essential for the induc- 
tion of type I IFN stimulated by foreign DNA (28). The 
cGAS-knockout mouse is strikingly similar to goldenticket 
mouse (loss of function of STING), both of which are sus- 
ceptible to infection by DNA viruses (11,28). cGAS ho- 
mologs are present in several vertebrate species and have 
structures similar to human or murine cGAS (17,26). In fish 
and pig, STING proteins also act as a mediator for acti- 
vating different IFN genes (29-31). These results indicate 
that cGAS-STING signaling may be the major and
non-redundant method of DNA recognition in the innate
immune system in mammals and even vertebrates.

Given the importance of cGAS-STING signaling in 
mammals, it is necessary to explore the evolutionary origins 
of this pathway. In this study, we aimed to present a com- 
prehensive molecular evolutionary analysis of cGAS and 
STING proteins by applying a systematic homolog search 
on all eukaryotic genomes that have been fully sequenced. 
We identified the origins of mouse cGAS and STING in 
the choanoflagellate Monosiga brevicollis, the closest rela- 
tive of metazoans. During metazoan evolution, both cGAS 
and STING were lost in nematodes and flatworms. We also 
utilized several methods to gain novel evolutionary insights 
into cGAS and STING structure-function relationships: (i) 
examining the evolutionary pattern of domain organiza- 
tion of cGAS and STING; (ii) mapping reported function- 
ally critical residues on cGAS and STING through multiple 
sequence alignments across representative species and (iii) 
structural modeling of STING proteins from species other 
than human and obtaining a structural basis for under- 
standing their binding with ligands. Furthermore, we pre- 
sented the evolutionary distribution of the cGAS-STING 
signaling pathway in representative eukaryotic organisms, 
allowing us to explore the evolutionary history of this path- 
way. 

MATERIALS AND METHODS 

Eukaryotic species 

A list of 190 fully sequenced eukaryotic species was
derived from the Database of KEGG Organisms (updated
on May 20, 2013)
(http://www.genome.jp/kegg/catalog/org_list.html) (32).
Chromosome, protein and mRNA sequences of these eukaryotic
species were downloaded from the National Center for
Biotechnology Information (NCBI) (release 59)
(ftp://ftp.ncbi.nih.gov/refseq/release/).
The genome of the ctenophore Mnemiopsis leidyi, an
early-branching metazoan, was fully sequenced recently
(33). The genome data of M. leidyi were downloaded from
the Mnemiopsis Genome Project Portal
(http://research.nhgri.nih.gov/mnemiopsis/). The genome
data of the African clawed frog Xenopus laevis (JGI v6.0)
were derived from Xenbase
(ftp://ftp.xenbase.org/pub/Genomics/JGI/Xenla6.0/).

Identification of cGAS and STING homologs 

Two rounds of searches on the protein and genomic se- 
quences were carried out to detect putative cGAS and 
STING homologs in fully sequenced eukaryotic species. 
The procedure for the cGAS homolog searches is illustrated
in Supplementary Figure S1A. (i) In the first round of
search, based on the protein sequences, cGAS homologs were
identified using a PSI-BLAST (34) search followed by
reverse BLASTP. Mouse cGAS was initially searched against
the eukaryotic proteome data set via PSI-BLAST v2.2.26. We
set the E-value threshold (-e) to 0.001, the E-value
threshold for inclusion in the multipass model (-h) to
0.002 (default value), and the maximum number of
iterations (-j) to 5. Then, the hits satisfying the
thresholds were aligned back against the mouse proteome via
BLASTP. A hit was identified as a putative homolog if its
best reverse BLASTP hit was mouse cGAS. (ii) Considering
that some cGAS homologs could be missed because of errors
in genome annotation or because some genes have not been
annotated, we carried out a second round of search based on
the genomic sequences. For the species without cGAS
homologs detected in the first round, all the candidate
cGAS homologs were aligned against the assembled genomic
sequences using TBLASTN (E-value < 0.001 and coverage of
the query sequence >45%). The identified gene region in a
target genome was extended by 1 kb on both sides, and then
input to Fgenesh+ (HMM plus similar protein-based gene
prediction) (35) to predict the gene structure and protein
sequence. (iii) Finally, 52 cGAS homologs in 45 eukaryotic
species were combined from the two rounds of searches based
on protein and genomic sequences. A total of 48 putative
STING homologs from 45 species were obtained in the same
way (Supplementary Figure S1B). Detailed information on
cGAS and STING homologs is listed in Supplementary Table S1
and their sequences in FASTA format are available in
Supplementary Data S1.
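
The reverse-BLASTP filter in step (i) amounts to a reciprocal-hit check. A minimal sketch in Python follows, assuming the BLAST results have already been parsed into (query, subject, E-value) tuples. The function names, protein identifiers and E-values are hypothetical illustrations and are not taken from the pipeline used in this study.

```python
def best_hits(rows):
    """Map each query to its lowest-E-value (subject, evalue) hit."""
    best = {}
    for query, subject, evalue in rows:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def putative_homologs(forward_rows, reverse_rows, seed, evalue_cutoff=1e-3):
    """Keep forward hits of `seed` whose best reverse hit is `seed` itself.

    forward_rows: hits of the seed protein against the target proteomes.
    reverse_rows: hits of each candidate against the seed's own proteome.
    The cutoff of 1e-3 mirrors the E-value threshold quoted in the text.
    """
    reverse_best = best_hits(reverse_rows)
    kept = []
    for query, subject, evalue in forward_rows:
        if query != seed or evalue > evalue_cutoff:
            continue
        back = reverse_best.get(subject)
        if back is not None and back[0] == seed:
            kept.append(subject)
    return kept


# Toy data (hypothetical identifiers and scores).
forward = [
    ("mouse_cGAS", "sp1_protA", 1e-40),
    ("mouse_cGAS", "sp2_protB", 1e-5),
    ("mouse_cGAS", "sp3_protC", 0.5),   # fails the E-value cutoff
]
reverse = [
    ("sp1_protA", "mouse_cGAS", 1e-38),
    ("sp1_protA", "mouse_Mab21l1", 1e-10),
    ("sp2_protB", "mouse_Mab21l1", 1e-8),  # best reverse hit is not cGAS
    ("sp2_protB", "mouse_cGAS", 1e-4),
]
homologs = putative_homologs(forward, reverse, "mouse_cGAS")
```

In this toy case only sp1_protA survives: sp2_protB is rejected because its best reverse hit is a different mouse protein, which is the situation the reverse search is meant to catch.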

Sequence analysis 

The cGAS and STING homologs in platypus Or- 
nithorhynchus anatinus are partial sequences. The cGAS 
homolog in western lowland gorilla lacks ~44% of the 
C-terminal fragment compared with the human sequence. 
To check whether there is any error in sequence annotation 
that leads to the incompleteness of the three proteins, 
we obtained the genomic regions of the three proteins 
and input them into Fgenesh+ by extending 3 kb on 
both sides. The predicted protein sequences are very 
similar to the original ones and are still incomplete. We 
did not include the three partial proteins in sequence 
alignments and construction of phylogenetic trees of cGAS 
and STING gene families. STING is known as an ER 
membrane protein in mammalian cells, containing four 
N-terminal transmembrane domains (TMs). We used 
HMMTOP v2.0 (36) to identify the TMs of the putative 
homologs. HMMTOP results were then checked manually 
and compared with human TMs. Four STING homologs
in Ixodes scapularis, Tribolium castaneum, Apis mellifera
and Branchiostoma floridae lacked TMs at the N terminus.
Considering that there may be some errors in sequence 
annotation of the four proteins, we obtained their genomic 
regions by extending 3 kb on both sides and then predicted 
gene structure and protein sequences using Fgenesh+. 
The predicted B. floridae protein sequence has two N- 
terminal TMs detected using HMMTOP and another 
software Phobius (37). Thus, the information of B. floridae
STING homolog was updated in Supplementary Table S1
and Supplementary Data S1. The other three predicted
sequences still lack TMs. The BLASTP results show that at
least 76% coverage of the three TM-lacking sequences
in I. scapularis, T. castaneum and A. mellifera could be
aligned with the mouse STING protein with E-value < 7e-10
(Supplementary Table S2), which indicates that the three
proteins are indeed similar in sequence to mouse STING.



CTT domains of STING homologs were determined in
two steps. Each STING homolog was first aligned with the
human CTT region using BLASTP (E-value < 0.01). Then,
if the aligned hit on the STING homolog sequence had
the two conserved residues corresponding to Ser 366 and
Leu 374 in human STING, which are important for IRF3
activation (38), the STING homolog was considered to
contain a CTT domain. The multiple sequence alignment of
the human CTT region with eukaryotic STING homologs is
shown in Supplementary Figure S2. Amino acid sequences were
aligned using PROMALS3D (39) and colored according 
to BLOSUM62 score in Jalview 2.8 (40). The secondary 
structure of human cGAS (residues from 164 to 513) was 
derived from the crystal structure (PDB ID: 4KM5). The 
secondary structure elements of all other cGAS homologs, 
Vibrio cholerae DncV and OAS1 proteins were identified 
using Jpred 3 (41). 
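
The second step of the CTT assignment, checking that a homolog carries the residues aligned to human Ser 366 and Leu 374, can be sketched as follows. This is an illustration only: it assumes a pairwise alignment is available as two gapped strings, and the function names and toy sequences are hypothetical (the toy reference row is not the real human STING sequence).

```python
def aligned_residue(ref_aln, hom_aln, ref_start, ref_pos):
    """Return the homolog character aligned to reference residue `ref_pos`.

    ref_aln / hom_aln are the two gapped rows of one pairwise alignment;
    ref_start is the 1-based reference coordinate of the first aligned
    reference residue. Returns None if ref_pos lies outside the alignment.
    """
    pos = ref_start - 1
    for ref_char, hom_char in zip(ref_aln, hom_aln):
        if ref_char == "-":
            continue  # gap in the reference: no reference coordinate consumed
        pos += 1
        if pos == ref_pos:
            return hom_char
    return None


def has_ctt(ref_aln, hom_aln, ref_start, required=((366, "S"), (374, "L"))):
    """True if the homolog has the required residue at every listed position."""
    return all(
        aligned_residue(ref_aln, hom_aln, ref_start, pos) == aa
        for pos, aa in required
    )


# Toy alignment: the reference row starts at coordinate 365,
# so 'S' sits at 366 and 'L' at 374 (one gap in between).
ref_row = "ASGG-GGGGGLA"
hom_ok  = "TSGGAGGGGGLT"   # S and L both present at the aligned columns
hom_bad = "TAGGAGGGGGLT"   # Ser 366 equivalent replaced by Ala
```

With these toy rows, has_ctt(ref_row, hom_ok, 365) holds while has_ctt(ref_row, hom_bad, 365) does not.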

Phylogenetic analysis 

A species tree (Figure 1) was constructed for visualization 
of the distribution of cGAS and STING homologs across 
61 metazoan species and the choanoflagellate M. brevicollis.
The plant Arabidopsis thaliana and fungus S. cerevisiae were 
used as outgroup sequences to root the tree. First, 27 highly 
conserved proteins (Supplementary Table S3) were selected 
from the published list (42,43). Each set of homologs which 
were identified via best bi-directional BLASTP search is 
present in A. thaliana, S. cerevisiae and at least 58 (95% in 
62) metazoan and M. brevicollis proteomes. Second, sepa- 
rate multiple sequence alignments for each set of homologs 
were built using MUSCLE v3.8.31 (44), with the maximum 
number of iterations set to 100. PhyML v3.1 (45) was used 
to derive the maximum likelihood (ML) trees with 100 boot- 
strap replicates by applying the JTT matrix (parameters set 
as -d aa -b 100 -m JTT -f e -v e -a e -quiet). Note that 100 
bootstrapped gene trees were then constructed for each set 
of homologs. Each gene tree was rerooted with A. thaliana
using the nw_reroot tool from the Newick Utilities package
v1.6 (46). Third, bootstrapped trees for all genes were
combined in one file and input to the phybase R package
v1.3 (47). From each set of bootstrap replicates, one STAR
tree (48) was estimated (47). Based on the multispecies
coalescent model, STAR uses average ranks of coalescences
to estimate species trees from a set of rooted gene trees
and constructs an NJ tree that is a consistent estimate of
the species tree topology (48). STAR cannot estimate branch
lengths. Finally, a consensus tree was constructed from the
100 bootstrapped STAR trees using the consense program
in the phylip package v3.69 (49). Tree visualization was
carried out in iTOL v2.1 (50) and then manually labeled and
colored for clarity. This species tree is in accordance with
trees published before (43,51-52) and NCBI taxonomy, with
one exception in the mammalian branch: M. musculus and
Rattus norvegicus should be evolutionarily closer to Macaca
mulatta than to Sus scrofa and Bos taurus.

The gene tree for depicting the phylogenetic relationship
of cGAS gene family (Figure 2A) was constructed using
PhyML based on the multiple sequence alignment of cGAS
sequences via MUSCLE. JTT matrix and 100 bootstrap
replicates were applied. The ML tree of STING gene family
(Figure 2B) was obtained in the same way. The phylogenetic
tree showing the relationship of cGAS homologs, OAS1
homologs and V. cholerae DncV (Supplementary Figure S3)
was built using PhyML based on the multiple sequence
alignment of these protein sequences via PROMALS3D (39).
All protein sequences, alignments and trees are available in
Supplementary Data S1.

Figure 1. Distribution of cGAS and STING homologs across a
choanoflagellate and 61 metazoan species. A species tree is shown in this
figure. Branch lengths are not intended to be to scale. Plant Arabidopsis
thaliana and fungus Saccharomyces cerevisiae were used as outgroups
to root the species tree. The species phylogenies were inferred based on
the summary statistics of coalescence times for 27 multilocus data sets.
Support values were derived from 100 bootstrap replicates. Taxonomic
branches are labeled in different colors. Pink, Cnidaria; brown, Nematoda;
purple, Arthropoda; cyan, Vertebrata; green, Mammalia. Each leaf
node is denoted as a standard species name followed by its three-letter
abbreviation in brackets. cGAS and STING homologs are labeled with
blue and red rectangles, respectively. The rectangles with oblique lines
indicate that the genes only have partial sequences and so are excluded from
the construction of gene trees in Figure 2. The species harboring multiple
copies of cGAS and STING homologs are marked with the number of
copies in blue and red, respectively. Sequences of three vertebrate cGAS
homologs (marked with blue asterisks) have no insertion of zinc-ribbon
DNA-binding domain (see Supplementary Text S1 for inspection of the
three proteins). The sequence references, alignments and trees in this
figure can be obtained from Supplementary Data S1.



Calculation of Ka and Ks 

Each homolog pair was aligned at the protein sequence level
using MUSCLE (44), and the codon alignments of the
mRNA sequences were created from the protein alignments
using PAL2NAL v14.0 (53). Ka and Ks values were then
calculated with yn00 according to Yang and Nielsen (54),
implemented in PAML v4.7 (55).
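
Converting a protein alignment into a codon alignment means threading each coding sequence onto its gapped protein row, so that every residue becomes its codon and every gap becomes three gap characters. The sketch below illustrates that idea only; it is not PAL2NAL's implementation, the function name is hypothetical, and it assumes the coding sequence starts in frame at the first aligned residue.

```python
def thread_codons(aligned_protein, cds):
    """Build one codon-alignment row from a gapped protein row and its CDS.

    Each non-gap residue consumes the next three nucleotides of `cds`;
    each '-' in the protein row becomes '---'. Trailing nucleotides
    (for example a stop codon) are ignored.
    """
    n_residues = sum(1 for char in aligned_protein if char != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is too short for the aligned protein")
    row = []
    offset = 0
    for char in aligned_protein:
        if char == "-":
            row.append("---")
        else:
            row.append(cds[offset:offset + 3])
            offset += 3
    return "".join(row)


# Toy pair: protein rows 'M-K' and 'MQK' with their coding sequences.
codon_row_1 = thread_codons("M-K", "ATGAAATAA")  # stop codon is dropped
codon_row_2 = thread_codons("MQK", "ATGCAGAAG")
```

The two resulting rows have equal length and keep codons in register, which is the input format that codon-based Ka/Ks estimators expect.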

Molecular modeling of STING proteins 

The comparative modeling program MODELLER (56) was 
used to generate models for STING homologs from other 
non-mammalian species. The structures of human STING 
(PDB ID: 4LOH) were used as templates in the struc- 
tural modeling experiments. The 3D models of STING 
from other species were obtained by optimally satisfy- 
ing spatial restraints derived from the sequence alignment 
based on the CLUSTALX results and 3D structures. The 
dimer structure models were assembled by superposing
the two monomer models onto the human STING dimer
structure bound with cGAMP. All structural
models were analyzed and minor manual adjustments of
the modeling solution were made using the graphics
program COOT (57). Figures were prepared with PyMOL
(The PyMOL Molecular Graphics System, Version 1.3,
Schrödinger, LLC).

RESULTS 

Presence of cGAS and STING in choanoflagellates and 
metazoans but not in nematodes 

To elucidate the comprehensive evolutionary history of 
cGAS and STING proteins, we determined the distribu- 
tion of these two proteins across 191 fully sequenced eu- 
karyotic species. We used two rounds of searches, which are 
respectively based on the levels of protein and genomic
sequences, to detect homologs of mouse cGAS and STING
(see Materials and Methods, Supplementary Figure S1).
There are 52 cGAS and 48 STING homologs present in
44 metazoan species and a unicellular eukaryotic organism,
the choanoflagellate M. brevicollis (Supplementary Table S1).
M. brevicollis is considered as the closest living relative of 
animals. Except for the choanoflagellate homologs of cGAS 
and STING, we did not find any homolog of cGAS and 
STING in the other branches (fungi, plants or protists) of 
eukaryotes. 

Figure 1 shows the visualization of the distribution of 
cGAS (in blue rectangles) and STING (in red rectangles) 
homologs on the 'life tree' of 61 metazoans and their close 
protozoan relatives. The species tree showing the phyloge- 
netic relationship of metazoan species was inferred from 100 
bootstrapped STAR trees (48) based on ML gene trees of 27 
universal genes (see Materials and Methods). Within meta- 
zoans, homologs of both cGAS and STING are present as 
early as in cnidarians (sea anemone Nematostella vectensis 
and Hydra magnipapillata). Both proteins are distributed 
in all Drosophila species, several non-Drosophila
arthropods and nearly all chordates except for torafugu
Takifugu rubripes; however, they were lost together in the
flatworm Schistosoma mansoni and nematodes. Interestingly,
the fully sequenced species either contain homologs of both
cGAS and STING or have lost both.

Figure 2. ML phylogenetic trees showing the relationships of cGAS (A) or STING (B) homologs. Three proteins with partial sequences, two cGAS
homologs in platypus Ornithorhynchus anatinus and Gorilla gorilla and one STING homolog in O. anatinus, were not included in the construction of gene
trees (see Materials and Methods). Bootstrap values equal to or larger than 75 are marked beside each node. Branch lengths indicate the number of amino
acid substitutions per site. The leaf nodes are colored according to the color scheme in Figure 1. Each leaf node is depicted as a three-letter abbreviation
for species name, followed by cGAS or STING. The correspondence between the abbreviations and the standard species names can be found in Figure 1
and Supplementary Data S1. The sequence references, alignments and trees in this figure are in Supplementary Data S1, and accession numbers of proteins
are noted in Supplementary Table S1.

To explore phylogenetic relationships of cGAS or STING 
homologs, we built an ML gene tree for each family
(Figure 2). All cGAS-containing species have one cGAS
homolog except for seven species (labeled with copy numbers
in blue in Figure 1): H. magnipapillata, T. castaneum,
Drosophila virilis, Drosophila persimilis, Drosophila
pseudoobscura, the cephalochordate B. floridae (Florida
lancelet) and Danio rerio (zebrafish), each of which has two
cGAS homologs. The phylogenetic tree of the cGAS gene
family (Figure 2A) shows that one round of cGAS homolog
duplication may have occurred before the divergence of D.
persimilis and D. pseudoobscura, and the cGAS homolog
might have duplicated once in each of the other five
genomes. The same holds for the STING family. H.
magnipapillata harbors three STING candidates and B.
floridae has two STING candidates. Figure 2B shows that
STING homologs have also experienced species-specific gene
duplication.

The Ka/Ks ratio is an important index of functional 
constraints. Ka refers to the number of non-synonymous 
substitutions per non-synonymous site, while Ks represents 
the number of synonymous substitutions per synonymous 
site. The smaller the Ka/Ks ratio is, the stronger the
functional constraints are (58). We listed Ka/Ks ratios of
cGAS, STING and TBK1 (a downstream protein activated by
STING in IFN-pathway induction) within four metazoan
lineages, namely, mammals (human and mouse), fish (D. rerio
and Oryzias latipes), insects (Drosophila grimshawi and
Drosophila willistoni; Drosophila melanogaster and
Drosophila simulans) and cnidarians (H. magnipapillata and
N. vectensis) (see Table 1, Materials and Methods). The
Ka/Ks ratios of all protein pairs are much smaller than
1.0, suggesting that these proteins are subject to
purifying selection (functional constraints). cGAS and
STING homologs show higher Ka/Ks ratios than TBK1, which
suggests that TBK1 has undergone strong purifying
selection, while the functional constraints on cGAS and
STING, both of which are upstream proteins in the IFN
induction pathway, may have been relaxed to some extent.

Figure 3. Evolution of functional domains in cGAS proteins. (A) A
diagram of the domain organization of human cGAS. Five red asterisks
indicate key catalytic residues (G212, S213, E225, D227 and D319) within the
NTase fold of human cGAS. (B) Diagrams of the domain organization of
cGAS homologs based on phylogeny of metazoans and M. brevicollis. The
major metazoan groups followed by the representative species in brackets
include cnidarians, such as sea anemone N. vectensis and H. magnipapillata,
insects, such as D. melanogaster (fruit fly), cephalochordate B. floridae
and vertebrates. The node of Nematoda is in gray indicating that cGAS was
lost within nematodes.
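
As a worked check of the Ka/Ks comparison, the ratios for the cnidarian pair can be recomputed from the Ka and Ks values in Table 1. The sketch assumes that the first two numeric columns of Table 1 are Ks and Ka, in that order. The function name and the threshold of 1.0 used to label purifying selection are illustrative choices.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of non-synonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    return ka / ks


# (Ka, Ks) for H. magnipapillata vs N. vectensis, read from Table 1
# under the assumption stated above about the column order.
cnidarian = {
    "cGAS-1": (0.88, 3.96),
    "cGAS-2": (0.87, 3.99),
    "STING-1": (0.64, 3.66),
    "STING-2": (0.74, 4.01),
    "STING-3": (1.11, 4.00),
}

ratios = {
    gene: round(ka_ks_ratio(ka, ks), 2)
    for gene, (ka, ks) in cnidarian.items()
}

# A ratio below 1 is the usual signature of purifying selection.
under_purifying_selection = all(ratio < 1.0 for ratio in ratios.values())
```

The recomputed values (0.22, 0.22, 0.17, 0.18 and 0.28) agree with the third numeric column of Table 1, which supports reading that column as Ka/Ks.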



Evolutionary pattern of domain organization and critical 
residues of cGAS 

Human cGAS is composed of an unstructured and 
poorly conserved N terminus (amino acid residues 1- 
160) and a highly conserved C terminus (160-513) (Fig- 
ure 3A). The C-terminal fragment contains two highly con- 
served domains, nucleotidyltransferase (NTase) core do- 
main (160-330) and Mab21 domain (213-513). In the 
NTase core domain, there are several conserved residues
associated with active sites within the NTase superfamily:
hG[G/S]X9-13[D/E]h[D/E]h...h[D/E]h (h indicates a
hydrophobic amino acid). The hG[G/S] pattern has a crucial
role in docking substrates within active sites and the
three conserved aspartate/glutamate residues are involved
in coordination of divalent ions and activation of acceptor
hydroxyl groups on the substrate (59). Inserted into
the Mab21 region is a zinc-ribbon structural domain
(390-405) typically defined as H(X5)CC(X6)C. The Mab21
domain was first identified in the nematode Caenorhabditis
elegans Mab-21, a cell fate-determining gene (60). Human
and mouse Mab-21-like proteins are homologous with the
nematode Mab-21 (61,62). Although the mouse cGAS is
also a Mab21-containing protein, the eukaryotic homologs
of mouse cGAS identified in this study are more similar
to mouse cGAS than to mouse Mab-21-like proteins
(Supplementary Table S4).
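
The two sequence patterns quoted above translate directly into regular expressions. The sketch below is an illustration, not the method used in this study: the set of residues treated as hydrophobic ('h') is an assumption, only the first part of the NTase pattern (up to the second acidic residue) is encoded because the distance to the third acidic residue is not fixed, and the test sequences are synthetic.

```python
import re

# Assumption: residues counted as hydrophobic ('h' in the pattern).
HYDROPHOBIC = "AVILMFWYC"

# hG[G/S]X9-13[D/E]h[D/E]h : first part of the NTase active-site pattern.
NTASE_CORE = re.compile(
    rf"[{HYDROPHOBIC}]G[GS].{{9,13}}[DE][{HYDROPHOBIC}][DE][{HYDROPHOBIC}]"
)

# H(X5)CC(X6)C : zinc-ribbon pattern.
ZINC_RIBBON = re.compile(r"H.{5}CC.{6}C")


def find_motif(pattern, sequence):
    """Return the 1-based (start, end) of the first match, or None."""
    match = pattern.search(sequence)
    if match is None:
        return None
    return (match.start() + 1, match.end())


# Synthetic sequences built to contain (or lack) each pattern.
ntase_like = "MK" + "LGS" + "T" * 10 + "EVDL" + "QQ"
zinc_like = "AA" + "H" + "GGGGG" + "CC" + "GGGGGG" + "C" + "A"
no_zinc = "AAHGGGGCC"
```

A scan of this kind only flags candidate sites; confirming that a match corresponds to the zinc-ribbon insertion or the catalytic core still requires the alignment-based comparison described in the text.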

Human cGAS shows significant structural similarity to 
the human oligoadenylate synthetase 1 (OAS1) which poly- 
merizes ATP into linear 2'-5'-linkage oligoadenylate upon 
stimulation by dsRNA. The structural similarity is espe- 
cially striking in the catalytic domain fold for complex for- 
mation of cGAS with dsDNA and bound ligands (23,26- 
27,63-64). Furthermore, the first enzyme reported to syn- 
thesize cGAMP is a bacterial dinucleotide cyclase (DncV) 
in V. cholerae, which has no obvious primary sequence 
homology to human cGAS (17,65). Although these three 
kinds of enzymes, cGAS homologs, OAS1 homologs and 
DncV can be classified clearly according to the phyloge- 
netic analysis (Supplementary Figure S3), they all belong 
to the nucleotidyltransferase superfamily and share similar 
patterns of secondary structural elements (Supplementary 
Figure S4). 

To gain novel insights into the evolutionary perspective of 
the relationships between structure and function in cGAS, 
we used two complementary means: (i) placing the domain 
organization of cGAS homologs on the evolutionary map 
of metazoans and M. brevicollis (Figure 3B) and (ii) map- 
ping functionally critical residues of cGAS reported in re- 
cent publications across these species (Figure 4, Supple- 
mentary Table S5). NTase core domain and Mab21 do- 
main are conserved in these species, but M. brevicollis and 
invertebrate cGAS homologs do not have a zinc-ribbon 
domain at the C terminus (Figures 3B and 4). This zinc- 
ribbon domain is conserved among all vertebrate cGAS 
members except for the three homologs in Pan paniscus 
(bonobo), Canis familiaris (dog) and Taeniopygia guttata 
(zebra finch). The lack of a zinc-ribbon insertion in
these three vertebrates may result from differences between
versions of genome assembly or genome annotation
(Supplementary Text S1, Supplementary Figure S5 and
Supplementary Table S6). The vertebrate-specific
zinc-ribbon domain of cGAS is not present in OAS1, nematode
Mab-21 or mammalian Mab-21-like proteins (Figure 4).
Furthermore, cGAS homologs across vertebrates as well as
the cephalochordate B. floridae have N-terminal fragments
with an average length of 167 amino acids, except for
chicken Gallus gallus and turkey Meleagris gallopavo. In
contrast, cnidarian and insect cGAS homologs contain a
very short N-terminal fragment, ~70 amino acids in N.
vectensis and fewer than 7 amino acids in the other species
(Figures 3B and 4). Similarly, OAS1, nematode Mab-21 and
mammalian Mab-21-like proteins also contain very short
N-terminal tails (Figure 4). This could indicate that the
~167-amino-acid-long N-terminal tail evolved in the
chordate/vertebrate lineage and seems to be a
cGAS-specific adaptation.

We next looked at the conservation of key residues within 
the NTase fold of human cGAS (G212, S213, E225, D227 
and D319) across OAS1, Mab-21-like proteins, nematode
Mab-21 and the prokaryotic protein DncV in V. cholerae 
(Figure 4). V. cholerae DncV synthesizes conventional 3'3'- 
cGAMP involved in bacterial chemotaxis and colonization 
while human cGAS synthesizes the specific 2'3'-cGAMP 
involved in sensing dsDNA. In addition, early metazoan 
cGAS and DncV show very similar patterns of secondary 
structural elements (Supplementary Figure S4). Thus, we 



Nucleic Acids Research, 2014, Vol. 42, No. 13 8249 



Table 1. Ka, Ks and sequence divergence of protein homologs between two species

Species                                Gene      Ks a   Ka b   Ka/Ks   Sequence divergence c

[...]                                  [...]     2.14   0.06   0.03    0.09

[...]                                  cGAS      0.14   0.06   0.46    0.13
                                       STING     0.11   0.04   0.36    0.09
                                       TBK1      0.15   0.02   0.11    0.04

Cnidarians d
H. magnipapillata e and N. vectensis   cGAS-1    3.96   0.88   0.22    0.73
                                       cGAS-2    3.99   0.87   0.22    0.76
                                       STING-1   3.66   0.64   0.17    0.61
                                       STING-2   4.01   0.74   0.18    0.69
                                       STING-3   4.00   1.11   0.28    0.72

a Ks, the number of synonymous substitutions per synonymous site.

b Ka, the number of non-synonymous substitutions per non-synonymous site.

c Sequence divergence, one minus identity of BLASTP alignment between two amino acid sequences.

d TBK1 is absent from H. magnipapillata and so is not in the last group of this table.

e D. rerio has two cGAS homologs (cGAS-1 and -2); H. magnipapillata has two cGAS homologs (cGAS-1 and -2) and three STING homologs (STING-1,
-2 and -3).


wondered whether DncV and human cGAS share conserved
NTase catalytic sites. As expected, the five
aforementioned NTase active-site residues are completely
conserved not only in cGAS homologs but also in DncV.
OAS1 proteins retain all five NTase catalytic positions,
although they contain Asp in place of E225. Conversely,
only one of the five residues, E225, is conserved in
mammalian Mab-21-like proteins and nematode Mab-21,
although another residue, D227, is conservatively replaced
with a glutamate.

It has recently been reported that human and mouse cGAS interact with
DNA through two binding sites, forming a complex composed of dimeric
cGAS bound to two DNA molecules. Both DNA binding surfaces and the
dimer interface play a critical role in DNA binding (24,25). We checked
the conservation across eukaryotic cGAS homologs of five functionally
important amino acid residues involved in 2'3'-cGAMP binding, three
positively charged residues on the primary DNA binding surface and
seven critical residues on the second DNA binding surface and dimer
interface (Supplementary Table S5). Of the five residues involved in
2'3'-cGAMP binding, S434 of human cGAS is not conserved in mammals,
whereas Y436 is conserved in all species; the other three residues
(K362, R376 and S378) are completely conserved in a cephalochordate and
in vertebrates but not in arthropods. On the primary DNA binding
surface, two lysine residues, K384 and K411, are completely conserved
in vertebrates but not in early-branching species, and the third, K407,
is highly conserved across all species except cnidarians, which carry
Arg at the corresponding position. Three critical residues on the dimer
interface, K347, K394 and K398, are conserved only in vertebrates. On
the second DNA binding surface, two residues, R236 and K254, are
conserved in vertebrates except for amphibians (Xenopus tropicalis and
X. laevis) but not in early-branching species, while the other two
(K327 and R353) are not conserved. It seems reasonable that these four
residues (R236, K254, K327 and R353) are not strictly conserved in
vertebrates, because only double (R236E/K254E and K254E/K327E) and
triple (R236E/K254E/K327E) mutations abrogated the ability of cGAS to
stimulate IFN production (24), and the positively charged R353 of human
cGAS is replaced by another positively charged amino acid, lysine, in
vertebrates. Strictly speaking, five functionally critical residues are
not conserved in vertebrates, two residues are highly conserved in all
species, and the other eight residues are completely conserved in
vertebrates but not in early-branching species.
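The bookkeeping behind the 5/2/8 summary can be checked with a short tally. The sketch below simply encodes the residue classes as described in the paragraph above; the dictionary name, the class labels and the helper functions are illustrative assumptions and are not part of the original analysis, and Supplementary Table S5 itself is not reproduced.

```python
from collections import Counter

# Class labels follow the wording of the summary sentence in the text.
NOT_VERT = "not conserved in vertebrates"
ALL_SPECIES = "highly conserved in all species"
VERT_ONLY = "conserved in vertebrates, not in early-branching species"

# residue -> (functional site, conservation class), as stated in the text.
# R236 and K254 are conserved in vertebrates except amphibians, so the text
# counts them as "not conserved in vertebrates" strictly speaking.
CGAS_RESIDUES = {
    "K362": ("2'3'-cGAMP binding", VERT_ONLY),
    "R376": ("2'3'-cGAMP binding", VERT_ONLY),
    "S378": ("2'3'-cGAMP binding", VERT_ONLY),
    "S434": ("2'3'-cGAMP binding", NOT_VERT),
    "Y436": ("2'3'-cGAMP binding", ALL_SPECIES),
    "K384": ("primary DNA binding surface", VERT_ONLY),
    "K411": ("primary DNA binding surface", VERT_ONLY),
    "K407": ("primary DNA binding surface", ALL_SPECIES),
    "K347": ("dimer interface", VERT_ONLY),
    "K394": ("dimer interface", VERT_ONLY),
    "K398": ("dimer interface", VERT_ONLY),
    "R236": ("second DNA binding surface", NOT_VERT),
    "K254": ("second DNA binding surface", NOT_VERT),
    "K327": ("second DNA binding surface", NOT_VERT),
    "R353": ("second DNA binding surface", NOT_VERT),
}


def tally_by_class():
    """Count residues in each conservation class."""
    return Counter(cls for _site, cls in CGAS_RESIDUES.values())


def tally_by_site():
    """Count residues at each functional site."""
    return Counter(site for site, _cls in CGAS_RESIDUES.values())


if __name__ == "__main__":
    for label, count in tally_by_class().items():
        print(f"{count:2d}  {label}")
```

The tally reproduces the 15 residues split into five, two and eight, and the site counts (five for 2'3'-cGAMP binding, three on the primary DNA binding surface, seven on the second surface and dimer interface combined).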
Figure 4. Multiple sequence alignment of DncV and the representative sequences of cGAS homologs, OAS1 and Mab-21-like proteins. DncV is a prokaryotic
enzyme named dinucleotide cyclase in Vibrio cholerae. The listed species include mammals H. sapiens and M. musculus, zebrafish D. rerio, cephalo-
chordate B. floridae, insect D. melanogaster, nematode Caenorhabditis elegans and cnidarians. If an organism harbors multiple cGAS homologs, only one 
homolog is listed. The Mab21 domain was first identified in the nematode cell fate-determining gene Mab-21 (60). Two human proteins (Mab-21-like 1 
and Mab-21-like 2) and two mouse proteins (Mab-21-like 1 and Mab-21-like 2) are homologous with the nematode Mab-21 (61,62). 



Evolutionary pattern of structural features of STING 

Human STING consists of four N-terminal TMs (amino acid residues
21-136), a central c-di-GMP-binding domain (CBD, 153-340) and a
C-terminal tail (CTT, 340-379) (Figure 5A). The CBD, which contains a
dimerization domain (DD, 155-180), protrudes into the cytoplasm (15).
Crystal structures of the ~240-amino-acid globular carboxy-terminal
domain (CBD+CTT, or CTD; 138-379) show that it mediates binding to the
bacterial second messenger c-di-GMP (66-70) and, as reported more
recently, to 2'3'-cGAMP produced in mammalian cells (20,21). The CTT
domain is important for STING to transduce signals (38,67).
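As a reading aid, the domain boundaries quoted above can be written as a small lookup that reports which domain(s) a given human STING residue falls in. This is only a sketch: the names Domain, HUMAN_STING_DOMAINS and domains_at are hypothetical, and the coordinates are taken verbatim from the paragraph above, including the shared boundary at residue 340.

```python
from typing import List, NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # first residue, inclusive
    end: int    # last residue, inclusive


# Boundaries as quoted in the text for human STING.
HUMAN_STING_DOMAINS = [
    Domain("TM", 21, 136),    # region spanning the four N-terminal TMs
    Domain("CBD", 153, 340),  # c-di-GMP-binding domain
    Domain("DD", 155, 180),   # dimerization domain, inside the CBD
    Domain("CTT", 340, 379),  # C-terminal tail
]


def domains_at(position: int) -> List[str]:
    """Return the names of all domains that contain the given residue."""
    return [d.name for d in HUMAN_STING_DOMAINS if d.start <= position <= d.end]


if __name__ == "__main__":
    for residue in (167, 232, 366, 374):
        print(residue, domains_at(residue))
```

With these coordinates, the ligand-contacting residues discussed below (for example Y167 and R232) map to the CBD, whereas S366 and L374 map to the CTT.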

Conserved structural domains in STING homologs are important for their
proper biological roles. The detection of novel STING homologs enabled
a detailed analysis of the evolutionary history of these conserved
domains (Figure 5B). Human and mouse STING proteins reside exclusively
on the ER membrane (14). In general, four putative TMs (red rectangles
in Figure 5B) exist in the STING homologs of the protozoan M.
brevicollis and of metazoans. Fewer than four TMs (pink rectangles with
dashed borders) are found in arthropods (except for the jewel wasp
Nasonia vitripennis), birds and the cephalochordate B. floridae (Figure
5B). Three other arthropod STING homologs, in I. scapularis, T.
castaneum and A. mellifera, lack TMs at the N terminus based on the
current genome data (see Materials and Methods). All STING homologs
have the conserved CBD and DD domains. However, the CTT is observed
only in vertebrates. Overall, modern STING proteins might have gained
their structural domains during the early evolution of vertebrates,
although the CTT domain is missing from the STING homologs of the
amphibians X. tropicalis and X. laevis.
Figure 5. Evolution of functional domains in STING proteins. (A) A diagram
of the domain organization of human STING. (B) Evolutionary
pattern of the domain organization of STING homologs based on the phy- 
logeny of metazoans and M. brevicollis. The node of nematoda is in gray 
indicating the absence of STING proteins. The TM and CTT domains were 
identified as described in the Materials and Methods. 



As mentioned previously, replacing R232 with histidine in human STING
was reported to affect its sensitivity, particularly toward
2'-5'-linkage-containing cGAMP isomers. Recent structural studies found
that residue 232 in human STING is part of the β-sheet lid over the
binding pocket of the STING-cGAMP complex, and that the arginine
residue interacts with the α-phosphate groups of cGAMP (20,21). The
R232 allele is highly conserved in STING homologs except for O. latipes
(Japanese medaka fish, Met 228), H. magnipapillata (Thr 230) and M.
brevicollis (Ile 234) (Table 2). Furthermore, substitutions of several
amino acid residues within the binding pocket of STING with Ala
indicate that Y167, R238, Y240, N242, E260 and T263 are involved in the
recognition of cGAMP isomers (20,21). Y167 and R238 are completely
conserved in all STING homologs. E260 and T263 are highly conserved:
the amino acid corresponding to E260 in the pig (S. scrofa) is a
glycine, and serine replaces T263 in chicken (G. gallus), turkey (M.
gallopavo) and M. brevicollis. The other two sites, Y240 and N242, are
fully conserved only in mammals.
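The substitutions listed in this paragraph can be held in a small mapping and queried by residue or by species. The sketch below encodes only what the paragraph states; Y240 and N242 are left out because their longer substitution lists are given in Table 2. The names SUBSTITUTIONS, fully_conserved and substitutions_in are illustrative assumptions.

```python
# Human STING residue -> {species: one-letter amino acid found instead}.
# An empty mapping means the text reports the residue as conserved in all
# STING homologs examined.
SUBSTITUTIONS = {
    "R232": {"O. latipes": "M", "H. magnipapillata": "T", "M. brevicollis": "I"},
    "Y167": {},
    "R238": {},
    "E260": {"S. scrofa": "G"},
    "T263": {"G. gallus": "S", "M. gallopavo": "S", "M. brevicollis": "S"},
}


def fully_conserved():
    """Residues with no reported substitution in any species."""
    return [res for res, subs in SUBSTITUTIONS.items() if not subs]


def substitutions_in(species):
    """Map each substituted residue to the amino acid found in `species`."""
    return {
        res: subs[species]
        for res, subs in SUBSTITUTIONS.items()
        if species in subs
    }


if __name__ == "__main__":
    print("fully conserved:", fully_conserved())
    print("M. brevicollis:", substitutions_in("M. brevicollis"))
```

Querying by species makes it easy to see, for instance, that the M. brevicollis homolog differs from human STING at both R232 and T263 among the residues listed here.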

To obtain an initial structural basis for understanding how STING
homologs from species other than human bind 2'3'-cGAMP, we generated
homology models of STING homologs from three species, Japanese medaka
O. latipes, chicken G. gallus and choanoflagellate M. brevicollis,
based on the structure of human STING (PDB ID: 4LOH) (see Materials and
Methods). Similar to the human STING structure reported recently (20),
the other STING homologs can also form a dimer (Figure 6A,
Supplementary Figure S6A and C) and probably adopt a closed
conformation with 2'3'-cGAMP bound. In O. latipes STING (Figure 6B),
Met 228 substitutes for Arg/His 232 of human STING. M228 lies on the
surface of the molecule, at the entrance of the ligand binding pocket.
M228 makes the pocket slightly more hydrophobic and would slightly
affect ligand binding affinity (possibly a bit






Figure 6. Structural modeling of STING from Japanese medaka Oryzias 
latipes binding with 2'3'-cGAMP. (A) Modeled dimer structure of STING 
homolog from O. latipes. One monomer is shown in green and the other 
in cyan. 2'3'-cGAMP is shown as thick bond model. (B) Comparison of 
2'3'-cGAMP binding pockets of human STING with O. latipes. cGAMP 
is shown as thick bond model and amino acids are shown as thin bond 
models in orange (human STING) or cyan (O. latipes homolog). Structural 
modeling of another two non-mammalian STING homologs in chicken 
G. gallus and choanoflagellate M. brevicollis, and their binding with 2'3'- 
cGAMP are displayed in Supplementary Figure S6. 



weaker) in O. latipes STING. M228 could also play a role during the
open-to-closed conformational transition. The substitution by Ile 234
in M. brevicollis STING (Supplementary Figure S6D) would have an effect
similar to that of M228 in O. latipes STING, whereas Thr in H.
magnipapillata makes the pocket and its entrance slightly polar. His
238 in O. latipes STING (Figure 6B) and His 247 in G. gallus STING
(Supplementary Figure S6B) occupy the position corresponding to N242 in
human STING. Like N242 in human STING, these histidines can form a
hydrogen bond with Y161 or Y172 (Y167 in human STING), which forms a
stacking interaction with the guanine or adenine ring of cGAMP. The
corresponding residue, Ile, in D. willistoni would probably interact
only weakly with the corresponding Tyr. In the G. gallus and M.
brevicollis STING homologs, Ser 268 and Ser 267 substitute for T263 of
human STING. These serines can also form a hydrogen bond with the amino
group on the guanine ring of cGAMP, but are less restricted in ligand
binding than T263 in human STING. In M. brevicollis STING, Met 242
replaces Y240 of human STING and provides a similar hydrophobic
environment for cGAMP binding. Our results show that even though
several key residues of human STING are substituted in STING homologs
from other species, these homologs have structures similar to that of
human STING. They are still expected to bind 2'3'-cGAMP at functional
levels, although probably slightly more weakly than human STING does.
We also modeled STING with 3'3'-cGAMP instead of 2'3'-cGAMP and found
that the difference in binding between the two ligands is minor (data
not shown).

Evolution of proteins in cGAS-STING signaling pathway 

In infected cells, cGAS senses cytosolic dsDNA from di- 
verse microbes and self-DNA in a sequence-independent 
manner and generates 2'3'-cGAMP as an endogenous sec- 
ond messenger, which binds STING to trigger the signal- 
ing pathway that leads to the production of cytokines, such 
as type I IFN (Figure 7A). The activation of STING has 
Table 2. Conservation analysis of key amino acid residues within the c-di-GMP binding domain (CBD) of human or mouse STING in the eukaryotic STING family

| Residue in mouse STING | Residue in human STING | Non-conserved in: Mammals | Non-conserved in: Non-mammal vertebrates | Non-conserved in: Arthropods | Non-conserved in: Cnidarians | Non-conserved in: Protist (M. brevicollis) |
|---|---|---|---|---|---|---|
| R231 | R232, H232 a | - c | Met (M) in O. latipes (Japanese medaka) | - | Thr (T) in H. magnipapillata | Ile (I) |
| Y166 | Y167 b | - | - | - | - | - |
| R237 | R238 b | - | - | - | - | - |
| Y239 | Y240 b | - | Phe (F) in A. carolinensis (green anole), X. tropicalis and X. laevis | Phe (F) in most arthropods d | - | Met (M) |
| N241 | N242 b | - | His (H) in O. latipes, T. guttata (zebra finch) and G. gallus (chicken); Gln (Q) in M. gallopavo | Asn (N) in Ixodes scapularis, T. castaneum, A. mellifera and N. vitripennis; Ile (I) in D. willistoni; His (H) in the others | His (H) in N. vectensis (sea anemone) | His (H) |
| E259 | E260 b | Gly (G) in S. scrofa (pig) | - | - | - | - |
| T262 | T263 b | - | Ser (S) in G. gallus and M. gallopavo | - | - | Ser (S) |

a R232/H232 in human and R231 in mouse are important for optimal response to 2'-5'-linkage-containing natural or unnatural cGAMP isomers (20,21).
b Amino acid substitutions of Ala at these positions reduced or abolished cGAMP-isomer-dependent IFN-pathway activation (20).
c '-' means the residue is conserved in all species.
d STING homologs in D. willistoni, D. persimilis, D. pseudoobscura, Tribolium castaneum, Apis mellifera and Nasonia vitripennis have the same amino acid Tyr (Y) as human STING.



been reported to be inhibited by ULK1 associated with AMPKα (71). To
study how this cGAS-STING-dependent type I IFN induction evolved, we
placed cGAS, STING and their downstream components onto the eukaryotic
evolutionary map (Figure 7B). The fungus S. cerevisiae (budding yeast)
was added to the representative species group as a contrast. AMPKα and
ULK1 are present in nearly all representative species. cGAS and STING
are present in the choanoflagellate M. brevicollis, but the other four
proteins (IKKε, TBK1, NF-κB and IRF3) involved in the activation of
this pathway are absent from M. brevicollis and S. cerevisiae,
suggesting that they are probably metazoan-specific proteins. IKKε and
TBK1 have been reported to group together in early metazoans (72).
Consistent with this, Figure 7B indicates that only a single homolog of
IKKε or TBK1 is present in the poriferan Amphimedon queenslandica,
cnidarians, the nematode C. elegans and the insect D. melanogaster. At
the origin of echinoderms, represented by Strongylocentrotus purpuratus
(purple sea urchin), IKKε and TBK1 diverged, and both are present in
late-branching metazoans. NF-κB is widely distributed in all metazoans
except nematodes. IRF3 homologs first appeared at the origin of fishes
but are absent from all three birds included in this study, G. gallus,
M. gallopavo and T. guttata. The bird genomes contain another member of
the IRF family, IRF7, which is also a transcription factor that induces
type I IFN and groups with IRF3 according to their evolutionary history
(73). IRF7 is present in all vertebrates, which is consistent with
previous studies (74,75).
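The presence/absence reasoning in this paragraph amounts to reading a phylogenetic profile. The sketch below records only the calls that the text states explicitly (Figure 7B itself is not reproduced, and any species not mentioned for a component is simply left unrecorded); the names PROFILE and absent_from_all are illustrative assumptions.

```python
# component -> {species: True (homolog present) / False (absent)}.
# Only presence/absence calls stated in the text are recorded; a missing
# species entry means "not stated here", not "absent".
PROFILE = {
    "cGAS": {"M. brevicollis": True},
    "STING": {"M. brevicollis": True},
    "IKKε/TBK1": {
        "M. brevicollis": False,
        "S. cerevisiae": False,
        "A. queenslandica": True,  # single IKKε/TBK1 homolog
        "C. elegans": True,        # single IKKε/TBK1 homolog
        "D. melanogaster": True,   # single IKKε/TBK1 homolog
    },
    "NF-κB": {
        "M. brevicollis": False,
        "S. cerevisiae": False,
        "C. elegans": False,
        "D. melanogaster": True,
    },
    "IRF3": {
        "M. brevicollis": False,
        "S. cerevisiae": False,
        "G. gallus": False,
        "M. gallopavo": False,
        "T. guttata": False,
    },
    "IRF7": {"G. gallus": True, "M. gallopavo": True, "T. guttata": True},
}


def absent_from_all(species_list):
    """Components explicitly recorded as absent from every listed species."""
    return [
        component
        for component, calls in PROFILE.items()
        if all(calls.get(species) is False for species in species_list)
    ]


if __name__ == "__main__":
    print(absent_from_all(["M. brevicollis", "S. cerevisiae"]))
    print(absent_from_all(["G. gallus"]))
```

Components absent from both the choanoflagellate and yeast are the candidates for metazoan-specific proteins, which is the inference drawn in the text for IKKε/TBK1, NF-κB and IRF3.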



DISCUSSION 

Does cGAS co-evolve with STING in metazoans? 

Using elegant biochemical and genetic experiments, Chen et al. showed
that mammalian cGAS proteins act as the major and nonredundant
cytosolic DNA sensor that generates the second-messenger product
2'3'-cGAMP. 2'3'-cGAMP then activates STING, which in turn stimulates
the downstream signaling pathway that leads to type I IFN production
(17-18,28). The evolution of cGAS and STING was examined in two recent
reviews (77,78). Schaap screened representative genomes of the major
metazoan phyla and the choanoflagellate M. brevicollis using a modified
best bidirectional BLASTP search, and found that STING homologs are
distributed in all major animal phyla, except for Porifera, and in M.
brevicollis, whereas cGAS is present only as early as the
cephalochordate B. floridae. In the other review, Wu and Chen obtained
similar results on the origins of cGAS and STING. In our study, we
applied a systematic and rigorous method to search for homologs in all
fully sequenced eukaryotic genomes, which expands the understanding of
the evolutionary distribution of both cGAS and STING genes across
eukaryotic species. Like that of STING, the emergence of cGAS could be
traced back to the choanoflagellate M. brevicollis, which is the
closest known relative of metazoans (79).

The cGAS family has several features in common with 
the STING family during metazoan evolution. First, both 
proteins are present early in several simple organisms, 
including cnidarians N. vectensis and H. magnipapillata. 
Figure 7. cGAS-STING signaling to trigger type I IFN and phylogenetic
profiles of its molecular components. (A) Overview of the cGAS-STING
signaling pathway leading to the production of type I IFN, adapted from
recent studies (1,71,76). Cytosolic dsDNA from diverse microbes and
self-DNA in infected cells are danger signals that are sensed by cGAS in
a sequence-independent manner. cGAS generates 2'3'-cGAMP as an
endogenous second messenger, which binds STING and induces a
conformational change in STING. The activated STING recruits the TBK1
and IKKε kinases, which in turn phosphorylate and activate IRF3 and
NF-κB, respectively. IRF3 and NF-κB then translocate to the nucleus to
induce type I IFN and other cytokines. Certain bacteria produce
c-di-GMP, c-di-AMP and 3'3'-cGAMP, which could activate some STING
alleles (such as R232 in human STING). The activation of STING is
inhibited by ULK1 (71). (B) Distribution of each molecular component (on
the right) of the signaling pathway in a representative group of species
(on the top) with fully sequenced genomes. The transcription factor IRF7
was added in this part because, just like IRF3, IRF7 is also considered
a master regulator of type I IFN induction. Eukaryotic homologs of each
molecular component were identified in a similar way to those of cGAS
and STING (see Materials and Methods). The presence of a homolog of the
molecular component in a particular species is shown in green and its
absence in light gray. The species are placed according to the
phylogenetic relationship in Figure 1. Most species have been indicated
in Figure 4 except for yeast S. cerevisiae, sponge Amphimedon
queenslandica, echinoderm Strongylocentrotus purpuratus (purple sea
urchin) and the non-mammal vertebrates X. tropicalis (western clawed
frog) and G. gallus (chicken).



However, both cGAS and STING were subsequently lost in nematodes and
flatworms, from which other key components (NF-κB and IRF3) of the
cGAS-STING signaling pathway are also absent (Figure 7B). Nematodes and
flatworms may rely on different mechanisms to trigger innate immune
signaling. For example, even though C. elegans has homologs of
Toll-like receptors, which have well-established roles in innate
immunity in mammals, these homologs do not function in the response to
infection in nematodes (80).

Second, both cGAS and STING are subject to functional constraints that
are relaxed to some extent compared with those on TBK1, a signaling
component at the downstream end of this pathway, suggesting that cGAS
and STING may be less conserved than TBK1 from insects to mammals
(Table 1).
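The comparison of functional constraint can be made explicit with the Ka/Ks ratio, where a higher ratio indicates a more relaxed constraint. The sketch below uses the Ks and Ka values listed for cGAS, STING and TBK1 in Table 1 and assumes that the third numeric value in each of those rows is the reported Ka/Ks ratio; this column assignment is inferred from the arithmetic rather than stated in the surviving table header, and the names TABLE1 and computed_ratio are illustrative.

```python
# gene -> (Ks, Ka, reported Ka/Ks), read from the cGAS/STING/TBK1 rows of
# Table 1. Treating the third value as Ka/Ks is an assumption.
TABLE1 = {
    "cGAS": (0.14, 0.06, 0.46),
    "STING": (0.11, 0.04, 0.36),
    "TBK1": (0.15, 0.02, 0.11),
}


def computed_ratio(gene):
    """Ka/Ks recomputed from the rounded Ks and Ka values."""
    ks, ka, _reported = TABLE1[gene]
    return ka / ks


def rank_by_constraint():
    """Genes ordered from most constrained (lowest reported Ka/Ks) upward."""
    return sorted(TABLE1, key=lambda gene: TABLE1[gene][2])


if __name__ == "__main__":
    for gene in rank_by_constraint():
        ks, ka, reported = TABLE1[gene]
        print(f"{gene:5s} reported={reported:.2f} recomputed={computed_ratio(gene):.2f}")
```

Recomputed ratios differ slightly from the reported ones because Ks and Ka are rounded to two decimals in the table, but the ordering (TBK1 most constrained, then STING, then cGAS) is the same either way.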

Third, the modern cGAS and STING proteins appear to have acquired their
domain features early in the evolution of vertebrates. For cGAS
proteins, the zinc-ribbon domain is not present in invertebrate cGAS
(Figures 3B and 4). The zinc-ribbon domain has been reported to be
functionally important for metal coordination and for interaction with
the major groove of DNA, suggesting that it serves as a molecular
'ruler' that sets the specificity of cGAS toward dsDNA (26-27,76).
Additionally, cGAS proteins in cnidarians and insects contain very
short N-terminal domains (Figures 3B and 4). This highly positively
charged N-terminal fragment of cGAS may play a role in DNA binding
because this fragment can bind an immune stimulatory DNA (ISD, 45 bp)
(17). Kranzusch et al. suspected that this N-terminal region also plays
an important role in stabilization or autoinhibition, as in other
nucleic acid sensors (RIG-I and AIM2) (27). Moreover, by investigating
the conservation across eukaryotic cGAS homologs of 15 amino acid
residues of human cGAS that are critical for DNA binding or involved in
2'3'-cGAMP binding, we found that, apart from five residues that are
not conserved in vertebrates, most amino acids are completely conserved
in vertebrates but not in arthropods, cnidarians or the choanoflagellate
M. brevicollis (Supplementary Table S5). Taken together, the
evolutionary analysis of key structural domains and critical residues
suggests that invertebrate cGAS homologs may not act as DNA sensors
that bind dsDNA. Similarly, modern STING possibly began to acquire its
CTT domain within the vertebrate lineage (Figure 5). The CTT domain
plays an essential role in the biological function of STING. In the
absence of ligand, the CTT binds the STING CBD and maintains STING in
an inactive, autoinhibited state. Ligand binding relieves the
autoinhibited state, exposes the CTT and stabilizes the STING dimer in
complex with the ligand (67). Exposure of the CTT also facilitates its
interaction with TBK1 to promote activation of IRF3. Two residues in
the CTT domain of human STING, S366 and L374, are important for IRF3
activation (38). Furthermore, Barber et al. studied the details of
STING phosphorylation and found that phosphorylation of S366, induced
by ULK1, may facilitate STING degradation to prevent sustained activity
(71). The ULK1-induced phosphorylation site S366 lies within the CTT
domain of human STING and is conserved in all vertebrates (including
fishes) except for the amphibians X. laevis and X. tropicalis
(Supplementary Figure S2). Amphibian STING homologs lack the CTT domain
and probably depend on different mechanisms for activation and
degradation. Thus, the function of cGAS and STING in sensing cytosolic
DNA to trigger the innate immune response is possibly restricted to
vertebrates, which is consistent with the finding that the IFN system
functions only in jawed vertebrates (81).

In summary, cGAS and STING share the three important evolutionary
characteristics discussed above. Interestingly, in mammals, cGAS and
STING cooperate to bring about innate immune signaling. In that regard,
cGAS represents a new class of cytosolic DNA sensor, while STING
cooperates with cGAS and functions as a central adaptor molecule. The
critical link is cGAMP, which can activate STING to launch type I IFN
induction and also trigger negative-feedback control of STING activity
(17,28,71). Therefore, contrary to the conclusion of Schaap that cGAS
and STING did not evolve together (77), we speculate that cGAS has
co-evolved with STING during the evolution of metazoans. Given that the
functionally critical zinc-ribbon domain and most of the amino acid
residues important for DNA binding in human cGAS are not conserved in
early cGAS homologs, and that the CTT domain, which is essential for
STING activation and for degradation through the ULK1-induced
phosphorylation site S366, is likewise not conserved in early STING
homologs, cGAS-STING may play roles other than activating an innate
immune response to cytosolic DNA in early metazoans.

Could cGAS produce cGAMP in early metazoans and M. brevicollis?

Even though we have suggested that cGAS may not function as a DNA
sensor in invertebrates, it remains unknown whether cGAS in M.
brevicollis and early metazoans can function as a cGAMP synthase. A
series of mutations in mammalian cGAS at each position in the
zinc-coordination site near the DNA-binding cleft abolished or severely
impaired 2'3'-cGAMP synthesis in vitro and the production of type I IFN
in vivo (26,27). A reason for this may be that the synthesis of
2'3'-cGAMP by cGAS requires the pronounced conformational changes that
take place after cGAS binds dsDNA, which trigger a repositioning of
catalytic residues in the binding pockets for ATP and GTP
(23-24,26,27,63-64). Invertebrate cGAS homologs do not contain the
zinc-ribbon domain (Figures 3B and 4), and four of the five amino acid
residues involved in 2'3'-cGAMP binding in human cGAS are not conserved
in arthropods, cnidarians or a choanoflagellate (Supplementary Table
S5). Nevertheless, in the absence of any experimental evidence
regarding the activity of invertebrate cGAS, we cannot conclude
unambiguously that, unlike mammalian cGAS, the cGAS homologs in
invertebrates are not 2'3'-cGAMP synthases. Moreover, the secondary
structure pattern and the NTase-specific motif
(hG[G/S]X9-13[D/E]h[D/E]h...h[D/E]h) are highly conserved in cGAS
homologs throughout metazoan evolution, in the choanoflagellate M.
brevicollis and in V. cholerae DncV (Figure 4 and Supplementary Figure
S4). Interestingly, DncV is capable of synthesizing 3'3'-cGAMP,
c-di-AMP and c-di-GMP in vitro when incubated with the corresponding
nucleoside triphosphates in a simple assay buffer. Therefore, we
hypothesize that 'early' cGAS proteins may be able to act as
nucleotidyltransferases and produce cGAMP or other kinds of cyclic
dinucleotides.
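For readers who wish to scan sequences for the NTase-specific motif quoted above, the pattern can be approximated as a regular expression. The sketch below is not the method used in this study. It assumes that 'h' stands for a hydrophobic residue drawn from the set A, V, I, L, M, F, W, Y, C, that 'X9-13' is any 9 to 13 residues, and that '...' is a gap of arbitrary length; the test sequence is synthetic and the function name is hypothetical.

```python
import re

# Assumption: residues treated as hydrophobic ('h' in the motif).
HYDROPHOBIC = "AVILMFWYC"

# hG[G/S] X9-13 [D/E]h[D/E]h ... h[D/E]h
NTASE_MOTIF = re.compile(
    rf"[{HYDROPHOBIC}]G[GS]"                       # hG[G/S]
    r".{9,13}"                                     # X9-13: any 9 to 13 residues
    rf"[DE][{HYDROPHOBIC}][DE][{HYDROPHOBIC}]"     # [D/E]h[D/E]h
    r".*?"                                         # '...': gap, shortest first
    rf"[{HYDROPHOBIC}][DE][{HYDROPHOBIC}]"         # h[D/E]h
)


def find_ntase_motif(sequence):
    """Return the (start, end) span of the first motif match, or None."""
    match = NTASE_MOTIF.search(sequence.upper())
    return match.span() if match else None


if __name__ == "__main__":
    # Synthetic example, not a real protein sequence.
    synthetic = "MKLGSKTNQRSPTKQELDVKKKKIDLRR"
    print(find_ntase_motif(synthetic))
```

A regular expression captures only the primary-sequence pattern; it says nothing about the secondary-structure conservation that the comparison with DncV also relies on.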

In a recent review, Schaap suspected, based on a conservation analysis
across homologous sequences of 10 residues in human STING that are
required for binding cyclic dinucleotides, that STING in metazoans and
their protist ancestors could detect cyclic dinucleotides long before
cGAS could synthesize 2'3'-cGAMP (77). Considering that cGAS and STING
may have evolved together during animal evolution, the conclusion by
Schaap probably provides complementary information for our hypothesis
that 'early' cGAS homologs could synthesize cyclic dinucleotides. To
provide further complementary information, we generated homology models
of STING from three non-mammalian species, the fish O. latipes (Figure
6), chicken G. gallus and the choanoflagellate M. brevicollis
(Supplementary Figure S6), based on the structure of human STING (PDB
ID: 4LOH). Similar to the reported human STING structure (20), the
three non-mammalian STING homologs can also form a dimer and probably
bind cGAMP at functional levels, although the binding would probably be
slightly weaker than for human STING.



If the cGAMP synthesis activity of cGAS evolved in M. brevicollis and
early metazoans, another question arises naturally: along the
trajectory of human evolution, did two kinds of cGAMP, canonical
3'3'-cGAMP and uncommon 2'3'-cGAMP, once exist together in the innate
immune system? cGAMP and all other cyclic dinucleotides in bacterial
cells are linked by 3'-5'-phosphodiester linkages. In mammals, namely
human, mouse and pig, the endogenous cGAMP has a unique
2'-5'-phosphodiester linkage between GMP and AMP (19,21-23,27). Another
nucleotidyltransferase, OAS1, can produce 2'-5'-phosphodiester linkages
and polymerizes ATP into 2'-5'-linked iso-RNA (2'-5'-oligoadenylate)
instead of 3'-5'-linked RNA upon dsRNA binding (63). Biochemical and
evolutionary analyses concluded that the earliest OAS1 protein with
2'-5'-oligoadenylate synthesis activity is found in Geodia cydonium
(marine sponge), a lineage that branched earlier than Cnidaria in the
kingdom Metazoa. G. cydonium OAS1 can produce both 3'-5' and 2'-5'
linkages but predominantly synthesizes the 2'-5' linkage (82,83).
Nothing is yet known about the products of cGAS in invertebrates;
however, we can get some clues from the STING family, considering that
cGAS and STING may have co-evolved in metazoans. For example, the H232
allele of human STING responds specifically to 2'3'-cGAMP but has lost
the ability to respond to 3'3'-cGAMP or bacterial c-di-GMP, while the
R232 allele can respond to all of these cyclic dinucleotides. This
suggests that the responsiveness to bacterial cyclic dinucleotides was
lost under strong selective pressure during human evolution (22,76).
Interestingly, the H232 STING allele appears only in humans, while R232
is highly conserved in most metazoans except for O. latipes, H.
magnipapillata and M. brevicollis (Table 2). Therefore, although
mammalian cGAS proteins synthesize the non-canonical 2'3'-cGAMP on
sensing cytosolic DNA, we suspect that, like OAS1, invertebrate cGAS
might have had the ability to produce both 2'3'-cGAMP and 3'3'-cGAMP.

CONCLUSION 

To conclude, cGAS and STING are already present in the unicellular
eukaryote M. brevicollis, but during the metazoan evolution that
followed, both were lost in nematodes and flatworms. Because both
proteins cooperate extensively in the stimulation of the innate immune
pathway and display similar evolutionary characteristics, we
hypothesize that cGAS and STING have co-evolved in M. brevicollis and
metazoans. Given the critical functions of cGAS and STING in mammals,
it is important to study the primitive biological functions controlled
by cGAS and STING in M. brevicollis and early metazoans. Based on the
evolutionary analysis of their structural organization, namely the
zinc-ribbon domain and long N-terminal fragment in cGAS and the CTT
domain in STING, modern cGAS and STING proteins may have gained their
functional domains early in the evolution of vertebrates. In addition,
vertebrate cGAS homologs retain most of the amino acid residues that
are important for DNA binding in human cGAS, whereas invertebrate cGAS
homologs vary at these residues (Supplementary Table S5). Therefore, we
hypothesize that cGAS and STING do not take part in the innate immune
response to cytosolic DNA in invertebrates. However, the high
conservation of secondary structures and key active residues between
cGAS homologs and V. cholerae DncV indicates that cGAS may already have
acquired the ability to synthesize cGAMP in M. brevicollis and
early-branching metazoans. A remaining question is whether 3'3'- and
2'3'-cGAMP once existed together during the evolution of the innate
immune system. We propose that cGAS might have been able to synthesize
both 3'3'- and 2'3'-cGAMP during some stage of metazoan evolution.
Although the specific physiological and biochemical roles of cGAS and
STING homologs in invertebrates remain uncertain, the conservation
analysis of critical domains, secondary structural elements and amino
acid residues provides novel insights into the relationships between
structure and function in both proteins. cGAS and STING do not function
in isolation; they activate cellular innate immune responses to
cytosolic DNA through interactions with downstream molecules. The study
of cGAS and STING together with the other signaling components from an
evolutionary perspective, which goes beyond the review by Schaap (77),
may provide valuable molecular insights into the functions and origins
of this type of pathway that initiates type I IFN.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 